[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking #23563

ekagra-ranjan · 2025-08-25T15:04:13Z

Adds the Spec Bench dataset (https://github.com/hemingkx/Spec-Bench) to the vLLM benchmark suite.
Benchmarking can be restricted to a single category, e.g., --spec-bench-category "summarization".
The dataset reuses CustomDataset sampling to avoid duplicating code.
For datasets that inherit from CustomDataset, setting --num-prompts to a value <= 0 loads all the data.
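
To illustrate the category filter and the "<= 0 loads everything" rule described above, here is a minimal standalone sketch. It is not the code in this PR: the function name `load_spec_bench_prompts` is hypothetical, and the record layout (a `category` string and a `turns` list per JSONL line) is an assumption about the Spec-Bench `question.jsonl` file.

```python
import json


def load_spec_bench_prompts(path, category=None, num_prompts=-1):
    """Hypothetical helper: read prompts from a Spec-Bench style JSONL file.

    Assumes each line is a JSON object with a "category" string and a
    "turns" list whose first entry is the prompt text.
    """
    prompts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Keep only the requested category, if one was given.
            if category is not None and record.get("category") != category:
                continue
            prompts.append(record["turns"][0])
    # A non-positive num_prompts means "use all the data".
    if num_prompts > 0:
        prompts = prompts[:num_prompts]
    return prompts
```

Calling `load_spec_bench_prompts(path, category="summarization", num_prompts=-1)` would return every summarization prompt in the file, mirroring the second command in the Test section.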

Test

cmd
time VLLM_USE_HYBRID_MEM=0 VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --tp 1 --dataset-name spec_bench --dataset-path "/host/vllm-cohere/data/spec_bench/question.jsonl" --num-prompts -1 --print-output

Output

--------------------------------------------------
--------------------------------------------------
total_num_output_tokens: 68355
num_drafts: 31214
num_draft_tokens: 93642
num_accepted_tokens: 37294
mean acceptance length: 2.19
--------------------------------------------------
acceptance at token 0: 0.66
acceptance at token 1: 0.36
acceptance at token 2: 0.17

cmd
time VLLM_USE_HYBRID_MEM=0 VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --tp 1 --dataset-name spec_bench --dataset-path "/host/vllm-cohere/data/spec_bench/question.jsonl" --num-prompts -1 --print-output --spec-bench-category "summarization"

Output

--------------------------------------------------
--------------------------------------------------
total_num_output_tokens: 17626
num_drafts: 8433
num_draft_tokens: 25299
num_accepted_tokens: 9193
mean acceptance length: 2.09
--------------------------------------------------
acceptance at token 0: 0.65
acceptance at token 1: 0.31
acceptance at token 2: 0.13

cmd
time VLLM_USE_HYBRID_MEM=0 VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --tp 1 --dataset-name spec_bench --dataset-path "/host/vllm-cohere/data/spec_bench/question.jsonl" --num-prompts -1 --print-output --spec-bench-category "math_reasoning"

Output

--------------------------------------------------
--------------------------------------------------
total_num_output_tokens: 12641
num_drafts: 4963
num_draft_tokens: 14889
num_accepted_tokens: 7701
mean acceptance length: 2.55
--------------------------------------------------
acceptance at token 0: 0.78
acceptance at token 1: 0.50
acceptance at token 2: 0.27
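
The reported numbers can be cross-checked with a little arithmetic. The sketch below assumes that the mean acceptance length is 1 + num_accepted_tokens / num_drafts (one token from the target model per step plus the accepted draft tokens); that formula is an inference from the figures above, not something stated in this PR, but it reproduces all three reported values. The run labels are only names for this example.

```python
# Counters copied from the three outputs above.
runs = {
    "all categories": {"num_drafts": 31214, "num_draft_tokens": 93642, "num_accepted_tokens": 37294},
    "summarization": {"num_drafts": 8433, "num_draft_tokens": 25299, "num_accepted_tokens": 9193},
    "math_reasoning": {"num_drafts": 4963, "num_draft_tokens": 14889, "num_accepted_tokens": 7701},
}

NUM_SPEC_TOKENS = 3  # matches --num_spec_tokens 3 in the commands


def mean_acceptance_length(num_drafts, num_accepted_tokens):
    # Assumed formula: one target-model token per step plus accepted draft tokens.
    return 1 + num_accepted_tokens / num_drafts


for name, r in runs.items():
    # Every draft proposes NUM_SPEC_TOKENS tokens, so the counters should agree.
    assert r["num_draft_tokens"] == r["num_drafts"] * NUM_SPEC_TOKENS
    mal = mean_acceptance_length(r["num_drafts"], r["num_accepted_tokens"])
    print(f"{name}: mean acceptance length = {mal:.2f}")
```

Under this assumption the per-position acceptance rates should also sum to roughly the mean acceptance length minus one, e.g. 0.66 + 0.36 + 0.17 = 1.19 for the full run.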

mergify · 2025-08-25T15:05:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request adds support for the Spec Bench dataset, which is a valuable addition for benchmarking. The implementation leverages the existing CustomDataset class effectively. However, I've identified two high-severity issues that should be addressed. First, a variable name for an argument group is reused, which is confusing and could lead to bugs. Second, the load_data method is called twice when initializing a SpecBench object, leading to unnecessary overhead. The provided code suggestions aim to fix these issues.

vllm/benchmarks/datasets.py
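
The double-load concern raised in the review can be pictured with a toy example. This is a sketch under assumptions, not the code from this PR: `BaseDataset`, `RedundantLoader`, and `SingleLoader` are hypothetical stand-ins, and the only point is that a subclass should not call `load_data()` again if its parent's constructor already does.

```python
class BaseDataset:
    """Hypothetical base class whose constructor already loads the data."""

    def __init__(self, dataset_path):
        self.dataset_path = dataset_path
        self.load_calls = 0
        self.load_data()

    def load_data(self):
        # Stand-in for an expensive file read; counts how often it runs.
        self.load_calls += 1
        self.data = [f"row from {self.dataset_path}"]


class RedundantLoader(BaseDataset):
    """Pattern flagged in the review: load_data() runs twice."""

    def __init__(self, dataset_path, category=None):
        self.category = category
        super().__init__(dataset_path)  # first load
        self.load_data()                # second, unnecessary load


class SingleLoader(BaseDataset):
    """One possible fix: set subclass state first, then load once."""

    def __init__(self, dataset_path, category=None):
        self.category = category        # available to load_data() if needed
        super().__init__(dataset_path)  # the only load


print(RedundantLoader("question.jsonl").load_calls)
print(SingleLoader("question.jsonl").load_calls)
```

Setting `category` before calling `super().__init__()` lets an overridden `load_data()` see the filter on its first and only run.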

mergify · 2025-08-26T16:00:13Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

keyboardAnt · 2025-09-01T20:30:40Z

@RoyNissim, you might find this PR relevant to your recent efforts in advancing and standardizing benchmarking.

ywang96

Thanks for the contribution - Could you also create a section under https://github.com/vllm-project/vllm/blob/main/benchmarks/README.md specifically for spec decode if we're going to have multiple benchmark datasets under this category? (This can be in a follow-up PR)

Signed-off-by: Ekagra Ranjan <[email protected]>

mergify · 2025-09-05T19:12:03Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify bot added the performance Performance-related issues label Aug 25, 2025

mergify bot added the needs-rebase label Aug 25, 2025

gemini-code-assist bot reviewed Aug 25, 2025


ekagra-ranjan mentioned this pull request Jul 21, 2025

[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE #17812

Open

mergify bot removed the needs-rebase label Aug 25, 2025

mergify bot added the needs-rebase label Aug 26, 2025

mergify bot removed the needs-rebase label Sep 3, 2025

ekagra-ranjan changed the title ~~[Spec Dec][Benchmark] Add Spec Bench Dataset for benchmarking~~ [Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking Sep 3, 2025

ywang96 approved these changes Sep 3, 2025


ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2025

ywang96 enabled auto-merge (squash) September 5, 2025 16:52

add spec benc

d562f98

Signed-off-by: Ekagra Ranjan <[email protected]>

auto-merge was automatically disabled September 5, 2025 19:10
Head branch was pushed to by a user without write access

ekagra-ranjan force-pushed the er-spec-bench branch from 5749f55 to d562f98 Compare September 5, 2025 19:10

mergify bot added the needs-rebase label Sep 5, 2025

Merge branch 'main' into er-spec-bench

b056c55

Signed-off-by: Ekagra Ranjan <[email protected]>

mergify bot removed the needs-rebase label Sep 5, 2025

Merge branch 'main' into er-spec-bench

7eb8926

ekagra-ranjan mentioned this pull request Sep 8, 2025

[Benchmark] Update bench doc with mtbench, blazedit, spec bench #24450

Merged

ywang96 merged commit 3feeeb9 into vllm-project:main Sep 8, 2025
38 checks passed

eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025

[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (vll…

b4f1752

…m-project#23563) Signed-off-by: Ekagra Ranjan <[email protected]>

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (vll…

815951b

…m-project#23563) Signed-off-by: Ekagra Ranjan <[email protected]>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (vll…

20b9ca1

…m-project#23563) Signed-off-by: Ekagra Ranjan <[email protected]>

sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025

[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (vll…

b091376

…m-project#23563) Signed-off-by: Ekagra Ranjan <[email protected]>
